Maximum Entropy Summary Trees
Authors
Howard Karloff, Kenneth E. Shirley
Abstract
Given a very large, node-weighted, rooted tree on, say, n nodes, if one has only enough space to display a k-node summary of the tree, what is the most informative way to draw the tree? We define a type of weighted tree that we call a summary tree of the original tree, which results from aggregating nodes of the original tree subject to certain constraints. We suggest that the best choice of summary tree (among those with a fixed number of nodes) is the one that maximizes the information-theoretic entropy of a natural probability distribution associated with the summary tree, and we provide a (pseudo-polynomial-time) dynamic-programming algorithm to compute this maximum entropy summary tree when the weights are integral. The result is an automated way to summarize large trees that retains as much information about them as possible while using (and displaying) only a fraction of the original node set. We illustrate the computation and use of maximum entropy summary trees on five real data sets whose weighted tree representations vary widely in structure. We also provide an additive approximation algorithm and a greedy heuristic that are faster than the optimal algorithm and that generalize to trees with real-valued weights.
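To make the objective concrete, here is a minimal Python sketch (with hypothetical weights; this is the scoring function, not the paper's dynamic-programming algorithm): each summary node's aggregated weight, divided by the total, gives a probability, and among same-size summaries the one with maximum Shannon entropy is preferred.

# A minimal sketch of the objective being maximized: a candidate
# summary tree induces a distribution over its k nodes, p_i = w_i / W,
# where w_i is the total original weight aggregated into summary node i.
from math import log2

def summary_entropy(aggregated_weights):
    """Entropy (in bits) of the distribution induced by a summary tree
    whose nodes carry the given aggregated (positive) weights."""
    total = sum(aggregated_weights)
    return -sum((w / total) * log2(w / total)
                for w in aggregated_weights if w > 0)

# Two hypothetical 3-node summaries of the same 100-unit tree:
balanced = [34, 33, 33]   # near-uniform aggregation
skewed   = [90, 5, 5]     # one "other" node swallows most of the weight
print(summary_entropy(balanced))  # ~1.585 bits, close to log2(3)
print(summary_entropy(skewed))    # ~0.569 bits, far less informative

Intuitively, a near-uniform aggregation spreads the display budget evenly over the tree's weight, which is why the entropy criterion favors it over a summary dominated by a single catch-all node.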
Similar references
Fast Algorithms for Constructing Maximum Entropy Summary Trees
Karloff and Shirley recently proposed "summary trees" as a new way to visualize large rooted trees (EuroVis 2013) and gave algorithms for generating a maximum-entropy k-node summary tree of an input n-node rooted tree. However, the algorithm generating optimal summary trees was only pseudo-polynomial (and worked only for integral weights); the authors left open the existence of a polynomial-time al...
Entropy based classification trees
One method for building classification trees is to choose split variables by maximising expected entropy. This can be extended through the application of imprecise probability by replacing instances of expected entropy with the maximum possible expected entropy over credal sets of probability distributions. Such methods may not take full advantage of the opportunities offered by imprecise proba...
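For illustration, a minimal Python sketch (with made-up labels; the credal-set extension is not shown) of the expected-entropy score that such split criteria are built on: each child's class entropy, weighted by the fraction of examples it receives.

# A minimal sketch of the expected-entropy score for a candidate split.
from collections import Counter
from math import log2

def class_entropy(labels):
    """Shannon entropy (bits) of the empirical class distribution."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def expected_entropy(partition):
    """Weighted-average entropy over the child nodes of a split."""
    n = sum(len(child) for child in partition)
    return sum(len(child) / n * class_entropy(child) for child in partition)

# Hypothetical binary split of ten labeled examples:
left, right = ["a", "a", "a", "b"], ["b", "b", "b", "b", "a", "a"]
print(expected_entropy([left, right]))  # score for this candidate split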
Maximum Entropy modeling for MWE identification: comparison with decision trees
This report begins with a description of the maximum entropy framework commonly used in modeling natural language processing tasks. To the best of our knowledge, this represents a first attempt to model multiword expression identification using a maximum entropy model. A related class of models, log-linear models, has previously been used for automated identification of phrasal verbs (e.g., look s...
Context-dependent acoustic modeling based on hidden maximum entropy model for statistical parametric speech synthesis
Decision-tree-clustered context-dependent hidden semi-Markov models (HSMMs) are typically used in statistical parametric speech synthesis to represent probability densities of acoustic features given contextual factors. This paper addresses three major limitations of this decision-tree-based structure: i) the decision tree structure lacks adequate context generalization; ii) it is unable to express complex context dependencies; iii) parameter...
Understanding Privacy Risk of Publishing Decision Trees
Publishing decision trees can provide enormous benefits to society. At the same time, it is widely believed that publishing decision trees can pose a potential risk to privacy. However, the privacy consequences of publishing decision trees have not been investigated much. To understand this problem, we need to quantitatively measure privacy risk. Based on the well-established maximum entropy the...
Journal: Comput. Graph. Forum
Volume: 32
Pages: -
Publication year: 2013